Entry Name:  "UBA-Cesario_Picoaga-MC3"

VAST 2013 Challenge
Mini-Challenge 3: Visual Analytics for Network Situation Awareness

 

 

Team Members:

 

Diego Martin Cesario, University of Buenos Aires (UBA), diegomcesario@gmail.com PRIMARY

Jorge Kuday Picoaga, University of Buenos Aires (UBA), georgepicoaga@gmail.com (Point of contact for questions/answers)

Student Team:  YES

Analytic Tools Used:

Tableau

Excel 2010

Oracle data mining

R (packages such as tau, ggplot, lattice, gridExtra and lubridate, plus programming)

Gephi for R

SQL Server 2008

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2013 is complete?  Yes

Video:

VAST_2013_MC3_Cesario_Picoaga.wmv 

 

 

Questions

MC3.1 – Provide a timeline (i.e., events organized in chronological order) of the notable events that occur in Big Marketing’s computer networks for the two weeks of supplied data. Use all data at your disposal to identify up to twelve events and describe them to the extent possible.  Your answer should be no more than 1000 words long and may contain up to twelve images.

The purpose of these observations was to gain an in-depth understanding of Big Marketing's network patterns and of suspicious activity on the network.

1. For a general view, we reproduced the complete network activity over the two weeks with heatmaps. Our first event is visible there: no log activity between the 8th and the 9th of April. At first we suspected a general server outage lasting two days, but we finally assumed that the lack of activity was due to the implementation of the new ipslogs system in the second week. Both the daily and the per-minute heatmaps of traffic volume (in gigabytes) show the days with the highest network activity: April 3rd and April 14th.

Event 1: Unusual network activity on two days; we were able to determine which periods were the busiest.
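The aggregation behind a day-by-hour heatmap like the one described above can be sketched as follows. This is only a minimal illustration, not the Tableau/R workflow we actually used: the record layout (timestamp, bytes) and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def activity_matrix(records):
    """Sum bytes per (day, hour) cell; `records` yields (timestamp, bytes)."""
    cells = defaultdict(int)
    for ts, nbytes in records:
        cells[(ts.date(), ts.hour)] += nbytes
    return dict(cells)


def busiest_days(cells, top=2):
    """Return the `top` days with the largest total volume, largest first."""
    per_day = defaultdict(int)
    for (day, _hour), nbytes in cells.items():
        per_day[day] += nbytes
    return sorted(per_day, key=per_day.get, reverse=True)[:top]


# Hypothetical sample rows, not taken from the challenge data.
sample = [
    (datetime(2013, 4, 3, 10, 5), 500),
    (datetime(2013, 4, 3, 11, 0), 700),
    (datetime(2013, 4, 4, 9, 0), 100),
    (datetime(2013, 4, 14, 22, 30), 900),
]
cells = activity_matrix(sample)
print(busiest_days(cells))
```

Each cell of the resulting dictionary corresponds to one colored square of the heatmap; days that never appear as a key are the gaps in the log.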

 

 

 

 

2. We then plotted the network activity per hour and per day, assuming that most activity would fall within business hours. The resulting bar plot reveals unusually high network activity at night on some days.

 

 

Event 2: Unusual network activity outside business hours.
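One way to quantify this kind of finding is the share of daily traffic that falls outside business hours. The sketch below assumes an 08:00 to 18:00 working day on weekdays; those hours, the function names and the record layout are illustrative assumptions, not values taken from the challenge data.

```python
from datetime import datetime

# Assumed office hours (local time); adjust to the real schedule.
BUSINESS_START, BUSINESS_END = 8, 18


def is_off_hours(ts):
    """True on weekends or outside the assumed business window."""
    return ts.weekday() >= 5 or not (BUSINESS_START <= ts.hour < BUSINESS_END)


def off_hours_share(records):
    """Per day, the fraction of bytes transferred off hours.

    `records` yields (timestamp, bytes) pairs.
    """
    totals, off = {}, {}
    for ts, nbytes in records:
        day = ts.date()
        totals[day] = totals.get(day, 0) + nbytes
        if is_off_hours(ts):
            off[day] = off.get(day, 0) + nbytes
    return {day: off.get(day, 0) / total for day, total in totals.items() if total}
```

Days whose share is far above the typical value are candidates for the unusual night activity described above.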

3. We used the bbcontent field of the BBrother table to derive a metric: server uptime by site. This metric gives the time each server had been up as of its last reported status. We used it to find servers with suspicious downtime during the two weeks and to learn more about server behavior patterns.

 

 

Event 3: Some servers had more unusual reboots than others.
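Reboots can be inferred from the uptime metric because uptime only grows while a server stays up; a drop between two consecutive reports implies a restart in between. The following is a minimal sketch under that assumption. The (hostname, uptime in seconds) input format is hypothetical, since extracting uptime from bbcontent is not shown here.

```python
from collections import Counter


def count_reboots(samples):
    """Count uptime resets per host.

    `samples` yields (hostname, uptime_seconds) in chronological order.
    A report whose uptime is lower than the previous one for the same
    host is counted as one reboot.
    """
    last = {}
    reboots = Counter()
    for host, uptime in samples:
        if host in last and uptime < last[host]:
            reboots[host] += 1
        last[host] = uptime
    return reboots
```

Sorting the result with `reboots.most_common()` gives the servers with the most restarts first.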

 

 

 

4. We visualized another metric we considered interesting: the percentage of disk used by hostname over the two weeks.

Event 4: Three servers show the fatal error warning message, which signals that the disk is full:

- Administrator server

- Web03.bigmkt3.com

- Web01.bigmkt1.com
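The servers above can be picked out by taking the peak disk usage reported by each host and keeping those above a threshold. This is only a sketch: the 90% threshold and the (hostname, percent used) input format are assumptions for illustration.

```python
def disk_alerts(readings, warn=90.0):
    """Return {hostname: peak disk usage %} for hosts at or above `warn`.

    `readings` yields (hostname, percent_used) pairs.
    """
    peak = {}
    for host, pct in readings:
        peak[host] = max(pct, peak.get(host, 0.0))
    return {host: pct for host, pct in peak.items() if pct >= warn}
```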

 

 

 

5. Because we needed evidence of external attacks on the servers, we focused on messages such as disk-use and down-status reports, and used text mining techniques to find them. We parsed the bbcontent field of the BBrother table and computed simple word frequencies to find outlier records. We built a bbcontent tokenizer with a dictionary of word meanings, created an inverted frequency plot, and found a new event.

Event 5: Suspected virus-related messages inside the bbcontent data.

 

Word | Frequency | Meaning
unreachable | 1432484 | Server cannot establish a connection; intuitive alert token
Loss | 1015549 | Loss of some kind; intuitive alert token
Can | 477259 | Refers to "can't": some process cannot be performed; intuitive alert token
Wmiprvse | 50338 | The Sasser trojan uses this name and similar variants (wmiprvsw.exe) to pass undetected.
Winlogon | 23505 | Winlogon.exe is a process name used by virus developers to hide from network administrators.
Wininit | 17788 | Usually refers to a virus attack at the start of server activity (in Windows environments).
vmupgradehelper | 3087 | Using VMUpgradeHelper.exe /r from a Windows command line (to restore NIC settings) fails with the error: Restore network config failed.
Vmwaretray | 2775 | The vmware-tray.exe file can be a very dangerous program and usually involves a VMware installation.
Panic | 1481 | Failures detected; intuitive alert token
Failed | 637 | Failures detected; intuitive alert token
Rebooted | 238 | To turn (a computer or operating system) off and then on again; restart; intuitive alert token
Slui | 67 | slui.exe refers to an application that is installed when Windows is not genuine.
msexchangemailboxassistants | 13 | msexchangemailboxassistants.exe: high CPU and network use for a specific user
Msftesql | 2 | The free file information forum can help you determine if msftesql.exe is a virus, trojan, spyware, or adware that you can remove, or a file belonging to a Windows system or an application you can trust.
Wlrmdr | 2 | A wlrmdr.exe file should only be located in the Windows\system32 folder of your PC, as this is the default path this file is designed to execute from. Unfortunately, some undetected spyware may be responsible for wlrmdr.exe errors on your computer by placing a file with a similar name. Moreover, an uninstall of a program that has been performed incorrectly or incompletely may also lead to wlrmdr.exe errors.
antispamupdatesvc | 1 | The "microsoft.exchange.antispamupdatesvc.exe" process can represent a threat or a virus.
Conhost | 1 | C:\WINDOWS\SYSTEM32\CONHOST.EXE always attempts to access my computer. So, it appears that malware has somehow externally programmed an attack to take place 4 minutes after I log in on my computer.
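The word-frequency approach behind the table can be sketched as below. This is not our original tokenizer: the token pattern, the function names and the small watchlist are assumptions chosen only to illustrate the idea of counting tokens in bbcontent and looking them up in a dictionary of meanings.

```python
import re
from collections import Counter

# Assumed token pattern: lowercase words of two or more characters.
TOKEN = re.compile(r"[a-z][a-z0-9_'-]+")


def word_frequencies(messages):
    """Count tokens across all message texts (case-insensitive)."""
    counts = Counter()
    for text in messages:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def flag_tokens(counts, watchlist):
    """Return (token, count, note) for watchlist tokens that occur,
    most frequent first. `watchlist` maps token -> meaning."""
    hits = [(tok, counts[tok], note) for tok, note in watchlist.items() if counts[tok] > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical messages and watchlist, for illustration only.
messages = ["host unreachable packet loss", "wmiprvse.exe running", "host unreachable"]
watchlist = {"unreachable": "alert token", "wmiprvse": "process name", "panic": "alert token"}
print(flag_tokens(word_frequencies(messages), watchlist))
```

Tokens that are rare overall but appear in the watchlist are the outliers worth inspecting by hand.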

 

 

 

6. In the second week, some interesting information came to light. Taking the records from the ipslogs table whose operation field has the value “deny”, we obtained the external IP addresses responsible for the majority of attacks on the network.

 

Event 6: Suspicious external IP addresses that appear to be attacking our network.
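The ranking of denied sources can be sketched as a simple count. The `operation` field and its “deny” value come from the ipslogs table as described above; the `srcip` key and the dictionary row format are assumptions for this example.

```python
from collections import Counter


def top_denied_sources(rows, n=5):
    """Return the `n` source IPs with the most denied records.

    `rows` yields dicts with at least an 'operation' key and a
    'srcip' key (the latter name is an assumption).
    """
    denied = Counter(
        row["srcip"] for row in rows if row["operation"].strip().lower() == "deny"
    )
    return denied.most_common(n)
```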

 

7. One of the suspicious IP addresses is 172.10.0.6, which unusually uses internal network notation but is not recorded in BigMktNetwork.txt. According to the network architecture document, this notation corresponds to Enterprise Site 1, so one of the attacks is using the internal address notation of site 1.

Event 7: An IP address using internal notation was detected, which implies an attack from the local network.
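A check of this kind, finding addresses that fall inside the internal ranges but are missing from the inventory, can be sketched with Python's standard ipaddress module. The three /16 ranges below are an assumption inferred from addresses such as 172.10.0.6 and 172.30.0.4; the real ranges should be taken from the architecture document.

```python
import ipaddress

# Assumed internal ranges for sites 1, 2 and 3.
INTERNAL_NETS = [
    ipaddress.ip_network(net)
    for net in ("172.10.0.0/16", "172.20.0.0/16", "172.30.0.0/16")
]


def unlisted_internal(observed, inventory):
    """Return observed IPs that look internal but are not in the inventory."""
    candidates = set(observed) - set(inventory)
    result = []
    for ip in sorted(candidates, key=ipaddress.ip_address):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in INTERNAL_NETS):
            result.append(ip)
    return result
```

For example, with 172.10.0.6 among the observed addresses and absent from the inventory, the function returns it as an unlisted internal address.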

 

 

8. With this unusual IP address in scope, we filtered all Big Brother records by 172.10.0.6. We chose this address because its range belongs to the local network but it is not listed in BigMktNetwork.txt. In a table visualization (scrolling through the data), we saw that many parameters in the bbcontent field are repeated across multiple records. We first thought this was due to status messages repeated across multiple hostnames. However, when we checked all the records generated by this unusual local IP address, we detected a total of 3,876,422 identical records.

Event 8: The message shown in Fig. 7 in the "bbcontent_extract" field was reported by 900 different servers and accounts for 290,133 Big Brother records, causing the bulk of the traffic.

 

9. After reading more about security, we learned about a technique used by external hackers called “port knocking”, in which the attacker sends connection attempts to a series of ports until the firewall gives way. Using a clustering technique, we detected many IP addresses receiving messages on unidentified ports.

Event 9: Unknown ports identified, consistent with a “port knocking” technique.
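As a simpler stand-in for the clustering we used, probing behavior can be approximated by counting how many distinct unrecognized destination ports each source contacts on each destination. The set of known ports, the threshold and the flow tuple layout are all assumptions for this sketch.

```python
from collections import defaultdict

# Assumed set of expected service ports; replace with the real list.
KNOWN_PORTS = {25, 53, 80, 443}


def port_probers(flows, min_ports=3):
    """Return {(src, dst): sorted unknown ports} for pairs that touch
    at least `min_ports` distinct unrecognized ports.

    `flows` yields (src_ip, dst_ip, dst_port) tuples.
    """
    seen = defaultdict(set)
    for src, dst, port in flows:
        if port not in KNOWN_PORTS:
            seen[(src, dst)].add(port)
    return {pair: sorted(ports) for pair, ports in seen.items() if len(ports) >= min_ports}
```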

 

 

10. The web server WEB03.BIGMKT3.COM, 172.30.0.4 (external IP address 10.0.4.4), reports no-connection status for the whole of April 4th, until April 5th at 08:00 am.

Event 10: A specific web server in site 3 is reported as down.

 

 

 

11. The network description states: “Organizationally, Big Marketing consists of three different branches, each with around 400 employees and its own web servers.” However, BigMktNetwork.txt lists 408 IP addresses for site 1, 407 for site 3 and, suspiciously, only 308 for site 2.

Event 11: The Big Brother network health monitoring program reports status for 100 workstations that are not declared in the BigMktNetwork.txt list. Their hostnames range from wss2-101 to wss2-200.
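The comparison between monitored and declared hosts is a set difference. The sketch below also groups the undeclared hosts by site; the hostname pattern is inferred from names such as wss2-101 and is an assumption, as are the function and variable names.

```python
import re
from collections import Counter

# Pattern inferred from workstation names such as wss2-101 (assumption).
HOST = re.compile(r"^wss(\d)-(\d+)$")


def undeclared_by_site(monitored, declared):
    """Return (sorted undeclared hostnames, count of them per site number)."""
    missing = {h.lower() for h in monitored} - {h.lower() for h in declared}
    per_site = Counter()
    for host in missing:
        match = HOST.match(host)
        if match:
            per_site[int(match.group(1))] += 1
    return sorted(missing), per_site
```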

 

 

 

 

12. Related to event 11, there are 11 IP addresses that begin with the 172 prefix. This prefix is supposed to belong to the internal Big Marketing network, but these suspicious addresses are not included in the network architecture document provided. The IP address discussed in event 8 is one of these 11.

Event 12: External attackers appear to be cloning internal IP addresses.

 

MC3.2 – Speculate on one or more narratives that describe the events on the network. Provide a list of analytic hypotheses and/or unanswered questions about the notable events. In other words, if you were to hand off your timeline to an analyst who will conduct further investigation, what confirmations and/or answers would you like to see in their report back to you? Your answer should be no more than 300 words long and may contain up to three additional images.

1. Based on our whole analysis, we assume that the principal threat to our network is the internal IP address 172.10.0.6. Under this theory, the attack originates inside the Big Brother installations and runs some processes as scheduled tasks outside business hours. We would like more information about this IP address in order to keep tracing it.

2. If we had known about any scheduled reboots, we could have explained the lack of information between April 7 and April 10. Without that knowledge, our investigation of the server reboots may rest on a biased sample.

3. We would like to contact the person responsible for network maintenance. It is very strange that no action was taken despite multiple automatic alerts about disk usage. As a result, the Administrator server reached 100% disk usage and two more servers reported over 90%.

4. To explain our findings, we speculate that the agent that logs server status every 5 minutes has been compromised by an external attack, because we found fuzzy, repeated messages in its content. System logs are one of the main sources for detecting intrusions, so we need to make sure that the status log works correctly.

5. It would be very useful to obtain a complete report of the activity scheduled outside office hours (such as big data and business intelligence OLAP processes), so that it can be ruled out as suspicious activity on the network.

6. Although the network description mentions three different branches, each with around 400 employees and its own web servers, only 308 workstations are listed for site 2 in the BigMktNetwork.txt file. On the other hand, the BBrother records contain 100 hostnames in the site 2 range that are not listed in that file. Assuming that every employee has a workstation, it is possible that the network architecture list is incomplete.

 

MC3.3 – Describe the role that your visual analytics played in enabling discovery of the notable events in MC3.1. Describe whether your visual analytics play a role in formulating the questions in MC3.2. Your answer should be no more than 300 words long and may contain up to three additional images.

Efficient information visualization is an important element of detecting intruders quickly. The conventional way of browsing system logs does not allow immediate action against unauthorized server access. We propose a portfolio built with Tableau and R that lets administrators easily identify intrusions in the Big Brother network and act on them quickly. Information can be viewed easily, and detecting an intrusion allows administrators to respond immediately and aim for 100% network availability, based on a visualization of the resources under attack. We would also like to propose building a web application that integrates all our metrics, to make unusual server activity easy to detect.

We explored ways of refining visualization techniques so that they can represent detailed information about each server and its behavior. We first chose R to plot our analysis, but we discarded those plots and decided to use Tableau. Some of our visualizations use concentric circles to show server disk usage. The concept and the system we have presented show that integrating information visualization into intrusion detection can be indispensable for detecting possible intrusions within a network.

Heatmaps offer a simple yet powerful way of displaying the distribution of large time series data. They use color to represent the density of points, making it easier to pick out areas of high activity.

Dashboard showing the unidentified ports found by clustering across all servers. The orange area indicates the unidentified ports.

For both MC3.1 and MC3.2 we consider information visualization important: with a scope as interesting as this case, it motivates the process of discovering models. In MC3.2 in particular, visualization played an important role in raising further questions, by drawing time series plots that cross information from several variables. The questions and doubts typically included as traditional evaluation criteria may well increase after visualization, because hidden patterns become visible.